Cross-lingual sentiment classification: Similarity discovery plus training data adjustment
Abstract
The performance of cross-lingual sentiment classification is sharply limited by the language gap: each language has its own ways of expressing sentiment. Many methods have been designed to transfer sentiment information across languages by making use of machine translation, parallel corpora, auxiliary unlabeled samples, and other resources. In this paper, a new approach is proposed based on the selection of training data, in which labeled samples highly similar to the target language are put into the training set. The refined training samples are used to build an effective cross-lingual sentiment classifier focused on the target language. The proposed approach contains two major strategies: the aligned-translation topic model and semi-supervised training data adjustment. The aligned-translation topic model provides a cross-language representation space in which the semi-supervised training data adjustment procedure selects effective training samples, with the aim of eliminating the negative influence of the semantic distribution differences between the original and target languages. The experiments show that the proposed approach is feasible for cross-language sentiment classification tasks and provides insight into the semantic relationship between two different languages. © 2016 Elsevier B.V. All rights reserved.
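The idea of keeping only labeled samples that resemble the target language can be illustrated with a minimal sketch. This is not the paper's procedure: it assumes every document has already been mapped to a vector in some shared cross-language space (the abstract obtains such a space from the aligned-translation topic model, which is not reproduced here), and it replaces the semi-supervised adjustment with a simple one-shot filter that ranks source samples by cosine similarity to the centroid of the target-language vectors. The function names, the `keep` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_training_samples(source_samples, target_vectors, keep):
    """Return the `keep` labeled source samples closest to the target centroid.

    source_samples: list of (vector, label) pairs from the labeled language.
    target_vectors: list of vectors for unlabeled target-language documents.
    """
    center = centroid(target_vectors)
    ranked = sorted(
        source_samples,
        key=lambda sample: cosine(sample[0], center),
        reverse=True,
    )
    return ranked[:keep]


# Toy data in an assumed 2-dimensional shared space.
target_vectors = [[0.9, 0.1], [0.8, 0.2]]
source_samples = [
    ([1.0, 0.0], "pos"),
    ([0.0, 1.0], "neg"),
    ([0.7, 0.3], "neg"),
]

selected = select_training_samples(source_samples, target_vectors, keep=2)
selected_labels = [label for _, label in selected]
```

The selected samples would then be passed to any standard classifier. A centroid is a crude summary of the target distribution, so a per-sample nearest-neighbour similarity or an iterative re-selection step may be closer in spirit to the semi-supervised adjustment the abstract describes.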
Similar resources
Co-Training for Cross-Lingual Sentiment Classification
The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...
Active Learning for Cross-Lingual Sentiment Classification
Cross-lingual sentiment classification aims to predict the sentiment orientation of a text in one language (called the target language) with the help of resources from another language (called the source language). However, current cross-lingual performance is normally far from satisfactory due to the huge differences in linguistic expression and social culture. In this paper, we sugg...
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews
The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic meth...
A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data
Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of the source language data and that of the target language data. To addres...
Cross Lingual Sentiment Analysis using Modified BRAE
Cross-Lingual Learning provides a mechanism to adapt NLP tools available for label rich languages to achieve similar tasks for label-scarce languages. An efficient cross-lingual tool significantly reduces the cost and effort required to manually annotate data. In this paper, we use the Recursive Autoencoder architecture to develop a Cross Lingual Sentiment Analysis (CLSA) tool using sentence al...
Journal title:
- Knowl.-Based Syst.
Volume 107
Publication date 2016